Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16869510359
by Esteemed Contributor
  • 8976 Views
  • 3 replies
  • 6 kudos

Resolved! How to add custom logging in Databricks

I want to add custom logs that are redirected to the Spark driver logs. Can I use the existing logger classes to write my application logs or progress messages to the Spark driver logs?
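
As a rough illustration of the question above, here is a minimal sketch that uses Python's standard `logging` module rather than the JVM logger classes the poster may have in mind. It assumes the driver's stderr stream is captured in the driver logs; the logger name `my_app` and the helper `get_app_logger` are hypothetical.

```python
import logging
import sys


def get_app_logger(name: str = "my_app") -> logging.Logger:
    """Return a logger that writes formatted records to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers when a cell is re-run
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    # Do not pass records on to the root logger, so each message appears once.
    logger.propagate = False
    return logger


logger = get_app_logger()
logger.info("Loaded %d rows", 1250)
```

Guarding on `logger.handlers` matters in notebooks, where the same cell is often executed several times in one session.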

Latest Reply
Kaizen
Contributor III
  • 6 kudos

1) Is it possible to save all the custom logging to its own file? Currently it is being logged together with all the other cluster logs (see image). 2) Also, it seems like Databricks is creating a lot of blank files for this. Is this a bug? this include...

  • 6 kudos
2 More Replies
PHorniak
by New Contributor II
  • 13025 Views
  • 3 replies
  • 4 kudos

Resolved! AttributeError: 'DataFrame' object has no attribute 'rename'

Hello, I am doing the Data Science and Machine Learning course. The Boston housing has unintuitive column names. I want to rename them, e.g. so 'zn' becomes 'Zoning'. When I run this command: df_bostonLegible = df_boston.rename({'zn':'Zoning'}, axi...

Latest Reply
KrunalLathiya
New Contributor II
  • 4 kudos

If df_boston is a DataFrame, but you still face issues, try an alternative syntax: df_boston = df_boston.rename(columns={'zn': 'Zoning'}). Make sure df_boston is a proper DataFrame and you're using a recent version of Pandas.
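
The error message in the question would also be consistent with df_boston being a Spark DataFrame rather than a pandas one, although the excerpt does not confirm this. Under that assumption, a small sketch of a helper (the name `rename_columns` is hypothetical) that handles both cases by checking for PySpark's `withColumnRenamed` method:

```python
def rename_columns(df, mapping):
    """Rename columns on either a Spark or a pandas DataFrame."""
    if hasattr(df, "withColumnRenamed"):
        # Spark DataFrames have no .rename(); rename one column at a time.
        for old_name, new_name in mapping.items():
            df = df.withColumnRenamed(old_name, new_name)
        return df
    # pandas DataFrames accept a mapping through the columns keyword.
    return df.rename(columns=mapping)
```

With this in place, the call from the question could be written as `df_bostonLegible = rename_columns(df_boston, {'zn': 'Zoning'})`.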

  • 4 kudos
2 More Replies
User16826994223
by Honored Contributor III
  • 2486 Views
  • 3 replies
  • 1 kudos

TPC-DS test on Databricks

If I want to run the TPC-DS test on Databricks, what steps are involved? Is the data already available on the Databricks file system, or do I have to download or create it from somewhere?

Latest Reply
aladda
Honored Contributor II
  • 1 kudos

See the spark-sql-perf repo for details on how to run benchmark tests using TPC-DS - https://github.com/databricks/spark-sql-perf

  • 1 kudos
2 More Replies
abhinandan084
by New Contributor III
  • 10371 Views
  • 17 replies
  • 12 kudos

Resolved! Community Edition signup issues

I am trying to sign up for the Community Edition (https://databricks.com/try-databricks) for use with a Databricks Academy course. However, I am unable to sign up and I receive the following error (image attached). On going to login page (link in ora...

Latest Reply
Kaniz
Community Manager
  • 12 kudos

Please look at this link related to the Community - Edition, which might solve your problem. I appreciate your interest in sharing your Community-Edition query with us. However, at this time, we are not entertaining any Community-Edition questions. W...

  • 12 kudos
16 More Replies
tj-cycyota
by New Contributor III
  • 5575 Views
  • 2 replies
  • 1 kudos

What's the difference between the magic commands %pip and %sh pip?

In Databricks you can do either %pip or %sh pip. What's the difference? Is there a recommended approach?

Latest Reply
stefnhuy
New Contributor III
  • 1 kudos

Hey there, User16776431030. Great question about those magic commands in Databricks! Let me shed some light on this mystical matter. The %pip and %sh pip commands may seem similar on the surface, but they're quite distinct in their powers. %sh pip is l...

  • 1 kudos
1 More Replies
User15986662700
by New Contributor III
  • 3008 Views
  • 4 replies
  • 1 kudos
Latest Reply
User15986662700
New Contributor III
  • 1 kudos

Yes, it is possible to connect databricks to a kerberized hbase cluster. The attached article explains the steps. It consists of setting up a kerberos client using a keytab in the cluster nodes, installing the hbase-spark integration library, and set...

  • 1 kudos
3 More Replies
Madman
by New Contributor II
  • 9790 Views
  • 7 replies
  • 6 kudos

Resolved! Snowflake connection to Databricks error

When I try to read a Snowflake table from my Databricks notebook, it gives an error. The code is: df1.read.format("snowflake") \ .options(**options) \ .option("query", "select * from abc") \ .save() I get the error below: java.sql.SQLException: No suitable dri...

Latest Reply
pdiegop
New Contributor II
  • 6 kudos

@anurag2192 did you manage to solve it?

  • 6 kudos
6 More Replies
User16752245312
by New Contributor III
  • 11414 Views
  • 2 replies
  • 3 kudos

How can I make Databricks API calls from notebook?

Access to Databricks APIs requires the user to authenticate. This usually means creating a PAT (Personal Access Token). Conveniently, a token is readily available to you when you are using a Databricks notebook. databricksURL = dbutils.notebook....
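
Once a workspace URL and token are in hand, the call itself is an HTTPS request with a bearer token. The following is a minimal sketch using only the Python standard library; it does not reproduce the truncated snippet above. The host `example.cloud.databricks.com`, the `<token>` placeholder, and the helper names are hypothetical, and `/api/2.0/clusters/list` is used only as an example endpoint.

```python
import json
import urllib.request


def build_api_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request."""
    url = f"https://{host}{path}"
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def call_api(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values; substitute the workspace host and a real token.
req = build_api_request(
    "example.cloud.databricks.com", "<token>", "/api/2.0/clusters/list"
)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers before any network call is made.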

Latest Reply
519320
New Contributor II
  • 3 kudos

hmmm.... no resolution yet? 

  • 3 kudos
1 More Replies
gtaspark
by New Contributor II
  • 41217 Views
  • 8 replies
  • 4 kudos

Resolved! How to get the total directory size using dbutils

Is there a way to get the directory size in ADLS (Gen2) using dbutils in Databricks? If I run dbutils.fs.ls("/mnt/abc/xyz"), I get the file sizes inside the xyz folder (there are about 5000 files). I want to get the size of the XYZ folder; how ca...

Latest Reply
User16788316720
New Contributor III
  • 4 kudos

File size is only specified for files. So, if you specify a directory as your source, you have to iterate through the directory. The below snippet should work (and should be faster than the other solutions). import glob   def get_directory_size_in_byt...
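
The iteration described in this reply can be sketched as follows. This is not the truncated snippet above: it swaps in `os.walk` from the standard library and assumes the storage is visible to the driver as a local filesystem path. The function name `directory_size_in_bytes` is hypothetical.

```python
import os


def directory_size_in_bytes(root: str) -> int:
    """Sum the sizes of all files below root, including subdirectories."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Directories carry no size of their own; only files are counted.
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total
```

For a folder with thousands of files the cost is dominated by listing, so a single walk is preferable to listing each subdirectory separately.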

  • 4 kudos
7 More Replies
Anonymous
by Not applicable
  • 4326 Views
  • 2 replies
  • 1 kudos
Latest Reply
wmespi
New Contributor II
  • 1 kudos

Is this random number not possible to extract from the notebook context? It is available in the browser_hash but that is not populated when running a job. Is this random number static or does it change over time? If it is static, it can then be hardco...

  • 1 kudos
1 More Replies
Aj2
by New Contributor III
  • 3586 Views
  • 4 replies
  • 1 kudos

Resolved! How to connect to DB2-AS400?

What steps are needed to connect to a DB2-AS400 source and pull data into the lake using Databricks? I believe it requires establishing a JDBC connection, but I could not find many details online.
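
A JDBC connection of the kind the question mentions is usually described by a handful of reader options. The sketch below only builds that option set; it assumes the open-source JTOpen (jt400) driver, whose class is `com.ibm.as400.access.AS400JDBCDriver` and whose URLs start with `jdbc:as400://`. The helper name and all argument values are hypothetical.

```python
def as400_jdbc_options(host: str, table: str, user: str, password: str) -> dict:
    """Build Spark JDBC reader options for a DB2 for i (AS/400) source."""
    return {
        "url": f"jdbc:as400://{host}",
        # Assumption: the JTOpen (jt400) driver JAR is installed on the cluster.
        "driver": "com.ibm.as400.access.AS400JDBCDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


options = as400_jdbc_options("as400.example.com", "MYLIB.ORDERS", "reader", "<secret>")
```

On a cluster these options would typically be passed as `spark.read.format("jdbc").options(**options).load()`, with the password read from a secret store rather than written inline.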

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Ajay Menon, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

  • 1 kudos
3 More Replies
keenan_jones7
by New Contributor II
  • 9036 Views
  • 3 replies
  • 5 kudos

Cannot create job through Jobs API

import requests import json instance_id = 'abcd.azuredatabricks.net' api_version = '/api/2.0' api_command = '/jobs/create' url = f"https://{instance_id}{api_version}{api_command}" headers = {'Authorization': 'Bearer myToken'} params = { "settings...

Latest Reply
rAlex
New Contributor II
  • 5 kudos

@keenan_jones7 I had the same problem today. It looks like you've copied and pasted the JSON that Databricks displays in the GUI when you select View JSON from the dropdown menu when viewing a job. In order to use that JSON in a request to the Jobs ...
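
The reply is cut off before it states the fix, so the following is only a guess at the kind of adjustment it describes. It assumes the JSON copied from the GUI nests the job definition under a "settings" key (as the question's params dictionary suggests), while the create endpoint expects those fields at the top level. The helper name and sample values are hypothetical.

```python
import json


def to_create_payload(job_json: dict) -> dict:
    """Return a request body for job creation from an exported job definition."""
    # Assumption: the exported JSON wraps the definition in "settings";
    # if that key is absent, the input is taken to be the definition itself.
    return dict(job_json.get("settings", job_json))


exported = {
    "job_id": 123,
    "settings": {"name": "nightly-load", "max_concurrent_runs": 1},
}
payload = to_create_payload(exported)
print(json.dumps(payload))
```

The resulting dictionary can then be sent as the JSON body of the POST request built in the question.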

  • 5 kudos
2 More Replies
Erik_L
by Contributor II
  • 1488 Views
  • 3 replies
  • 4 kudos

Resolved! Data size inflates massively while ingesting

Goal: Import and consolidate GBs / TBs of local data, stored as 20 MB Parquet chunk files, into Databricks / Delta Lake / partitioned tables. What I've done: I took a small subset of data, roughly 72.5 GB, and ingested it using the streaming code below. The data is already seq...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Erik Louie, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

  • 4 kudos
2 More Replies
ironising84
by New Contributor II
  • 3317 Views
  • 4 replies
  • 6 kudos

Question on Databricks Spark online proctored exam

Some silly questions, folks. I took the online proctored Databricks Spark certification exam a couple of days back and my unofficial result was a pass. I received a mail saying that it might take up to one week to receive the certification, if awar...

Latest Reply
Rajeev_Basu
Contributor III
  • 6 kudos

It would have been better to ask for permission before drinking. I can share my experience. My mobile alarm started buzzing during the exam. I alerted the moderator, who then paused the exam and asked me to take my laptop over to the mobile and then switch it off,...

  • 6 kudos
3 More Replies