Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aimas
by New Contributor III
  • 13523 Views
  • 8 replies
  • 5 kudos

Resolved! error creating tables using UI

Hi, I'm trying to create a table using the UI, but I keep getting the error "error creating table <table name> create a cluster first" even though I have a cluster already running. What is the problem?

Latest Reply
Hubert-Dudek
Databricks MVP
  • 5 kudos

Make sure that a cluster is selected (via the arrow in the database view) and that at least the Default database exists.

7 More Replies
Orianh
by Valued Contributor II
  • 32338 Views
  • 11 replies
  • 10 kudos

Resolved! Read JSON files from the s3 bucket

Hello guys, I'm trying to read JSON files from an S3 bucket, but no matter what I try I get "Query returned no result", or, if I don't specify the schema, "unable to infer a schema". I tried mounting the S3 bucket, but it still doesn't work. Here is some code th...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

Please refer to the doc that helps you read JSON. If you are getting this error, the problem is most likely with the JSON schema. Please validate it. As a test, create a simple JSON file (you can get one on the internet), upload it to your S3 bucket, and ...
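A minimal sketch of the "validate it" step, using only the Python standard library. It checks whether a file's text parses as JSON Lines (one value per line, which is what Spark's JSON reader expects by default) or only as a single multi-line document (which needs the reader's `multiLine` option). The function name `classify_json_text` and the sample strings are illustrative assumptions, not taken from the thread.

```python
import json


def classify_json_text(text: str) -> str:
    """Return 'json_lines', 'multiline', or 'invalid' for the given text."""
    # Ignore blank lines so a trailing newline does not count as a bad record.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "invalid"

    # Case 1: every non-blank line is a complete JSON value (JSON Lines).
    try:
        for line in lines:
            json.loads(line)
        return "json_lines"
    except json.JSONDecodeError:
        pass

    # Case 2: the text as a whole is one JSON document spread over lines.
    try:
        json.loads(text)
        return "multiline"
    except json.JSONDecodeError:
        return "invalid"


if __name__ == "__main__":
    print(classify_json_text('{"a": 1}\n{"a": 2}\n'))
    print(classify_json_text('{\n  "a": 1\n}'))
```

If a file comes back as `multiline`, reading it with the default settings is one plausible cause of an empty result or a schema inference failure.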

10 More Replies
Data_Bricks1
by New Contributor III
  • 7019 Views
  • 7 replies
  • 0 kudos

Data from 10 blob containers with multiple hierarchical folders (daily and hourly folders) in each container to a Delta Lake table in Parquet format - incremental loading of the latest data only, inserts with no updates

I am able to load data for a single container by hard coding, but I am not able to load from multiple containers. I used a for loop, but the data frame only loads the last record from the last container's last folder. One more issue is that I have to flatten the data, when I ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

For sure, the function (def) should be declared outside the loop; move it right after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break, sleep, etc.).
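A rough sketch of that structure in plain Python, with lists of dicts standing in for DataFrames: the flatten helper is defined once, outside the loop, and each iteration appends to a collection created before the loop instead of reassigning it (reassigning inside the loop is one common way to end up with only the last container's data). The names `flatten_record`, `load_all`, and the sample data are hypothetical; with Spark DataFrames the accumulation step would typically be a `unionByName` rather than `append`.

```python
from typing import Dict, List


def flatten_record(record: Dict, parent_key: str = "") -> Dict:
    """Flatten nested dicts into one level, joining keys with '_'."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}_{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name))
        else:
            flat[name] = value
    return flat


def load_all(containers: Dict[str, List[Dict]]) -> List[Dict]:
    """Collect flattened rows from every container, not just the last one."""
    all_rows = []  # created once, before the loop
    for container_name, records in containers.items():
        for record in records:
            row = flatten_record(record)
            row["container"] = container_name
            all_rows.append(row)  # accumulate instead of overwriting
    return all_rows


if __name__ == "__main__":
    sample = {
        "c1": [{"id": 1, "meta": {"hour": 3}}],
        "c2": [{"id": 2, "meta": {"hour": 4}}],
    }
    for row in load_all(sample):
        print(row)  # inspect after each step, as suggested above
```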

6 More Replies
StephanieAlba
by Databricks Employee
  • 3246 Views
  • 1 replies
  • 6 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 6 kudos

Hi, these are transactional tables (with commit history and snapshots), so I would not store images or videos there: the data can be saved several times over, which means high storage costs, and it can also be slow when the data is big. I would definitely store images,...

yitao
by New Contributor III
  • 4783 Views
  • 4 replies
  • 10 kudos

Resolved! How to make sparklyr extension work with Databricks runtime?

Hello. I'm the current maintainer of sparklyr (an R interface for Apache Spark) and a few sparklyr extensions such as sparklyr.flint. Sparklyr was fortunate to receive some contributions from Databricks folks, which enabled R users to run `spark_connect...

Latest Reply
Dan_Z
Databricks Employee
  • 10 kudos

Yes, as Sebastian said. Also, it would be good to know what the error is here. One possible explanation is that the JARs are not copied to the executor nodes. This would be solved by Sebastian's suggestion.

3 More Replies
User16826994223
by Databricks Employee
  • 7543 Views
  • 2 replies
  • 1 kudos

AssertionError: assertion failed: Unable to delete the record but I am able to select it though

Is there any reason this command works well: `%sql SELECT * FROM datanase.table WHERE salary > 1000`, returning 2 rows, while the one below: `%sql delete FROM datanase.table WHERE salary > 1000` gives an error: Error in SQL statement: AssertionError: assertion failed:...

Latest Reply
User16826994223
Databricks Employee
  • 1 kudos

DELETE FROM (and similarly UPDATE) isn't supported on Parquet files; right now on Databricks, it's supported for the Delta format. You can convert your Parquet files to Delta using CONVERT TO DELTA, and then this command will work for you.
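A small sketch of that sequence, assuming the Parquet table is registered in the metastore under a name such as `my_db.my_table` (a placeholder) and that `spark` is the session a Databricks notebook provides. The helper name `convert_then_delete` is hypothetical; it only issues the two SQL statements in order.

```python
def convert_then_delete(spark, table_name: str, condition: str) -> list:
    """Convert a Parquet table to Delta, then run the DELETE against it.

    `spark` is any object with a .sql(str) method, e.g. the notebook's
    SparkSession. Returns the statements that were issued, in order.
    """
    statements = [
        # One-time, in-place conversion of the existing Parquet table.
        f"CONVERT TO DELTA {table_name}",
        # After conversion the table is Delta, so DELETE is supported.
        f"DELETE FROM {table_name} WHERE {condition}",
    ]
    for statement in statements:
        spark.sql(statement)
    return statements


# In a notebook (placeholder table name, condition from the question):
# convert_then_delete(spark, "my_db.my_table", "salary > 1000")
```

The conversion only needs to happen once; later DELETE or UPDATE statements can be run on their own.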

1 More Replies
dataslicer
by Contributor
  • 14363 Views
  • 4 replies
  • 4 kudos

Resolved! Unable to save Spark Dataframe to driver node's local file system as CSV file

I'm running Azure Databricks Enterprise DBR 8.3 ML on a single node, with a Python notebook. I have 2 small Spark dataframes that I am able to source via credential passthrough, reading from ADLSgen2 via the `abfss://` method, and display the full content ...

Latest Reply
Dan_Z
Databricks Employee
  • 4 kudos

Modern Spark operates by a design choice to separate storage and compute. So saving a CSV to the driver's local disk doesn't make sense for a few reasons: the worker nodes don't have access to the driver's disk. They would need to send the data over to...

3 More Replies
PaulHernandez
by New Contributor II
  • 35594 Views
  • 7 replies
  • 0 kudos

Resolved! How to show an image in a notebook using html?

Hi everyone, I'm just learning how to personalize Databricks notebooks and would like to show a logo in a cell. I installed the Databricks CLI and was able to upload the image file to DBFS. I try to display it like this: displayHTML("<im...

Latest Reply
_robschaper
New Contributor II
  • 0 kudos

@Paul Hernandez @Sean Owen @Navneet Tuteja I solved this after I also ran into the same issue, where my notebook suddenly wouldn't show an image sitting on the driver in an accessible folder. No matter what I was trying in the notebook the display...

6 More Replies
daindana
by New Contributor III
  • 15836 Views
  • 3 replies
  • 3 kudos

Resolved! Why doesn't my notebook display widgets when I use 'dbutils' while it is displayed with '%sql CREATE WIDGET'?

The widget is not shown when I use dbutils, while it works perfectly with SQL. For example, %sql CREATE WIDGET TEXT state DEFAULT "CA". This one shows me the widget. dbutils.widgets.text("name", "Brickster", "Name") dbutils.widgets.multiselect("colors", "oran...

Latest Reply
daindana
New Contributor III
  • 3 kudos

Hello, Ryan! For some reason, this problem is solved, and now it is working perfectly! I did nothing new, but it is just working now. Thank you!:)

2 More Replies
BorislavBlagoev
by Databricks Partner
  • 8258 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks writeStream checkpoint

I'm trying to execute this writeStream data_frame.writeStream.format("delta") \ .option("checkpointLocation", checkpoint_path) \ .trigger(processingTime="1 second") \ .option("mergeSchema", "true") \ .o...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

You can remove that folder so it will be recreated automatically. Additionally, every new job run should have a new (or just empty) checkpoint location. You can add this to your code before running streaming: dbutils.fs.rm(checkpoint_path, True). Additionally you...
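A minimal sketch of both options from this reply: clearing an existing checkpoint folder before the stream starts, or giving each run its own fresh location. The helper names `reset_checkpoint` and `new_checkpoint_path`, the `run-` prefix, and the example base path are assumptions for illustration; `dbutils` is passed in as an argument so the helpers can be exercised outside a notebook.

```python
import uuid


def new_checkpoint_path(base_path: str) -> str:
    """Return a unique checkpoint location under base_path for one job run."""
    return f"{base_path.rstrip('/')}/run-{uuid.uuid4().hex}"


def reset_checkpoint(dbutils, checkpoint_path: str) -> None:
    """Remove the checkpoint folder so it is recreated on the next start.

    Deleting a checkpoint discards the stream's saved progress, so the
    query starts over rather than resuming where it left off.
    """
    # Same call as in the reply; the second argument means "recursive".
    dbutils.fs.rm(checkpoint_path, True)


# In a notebook, before starting the stream (paths are placeholders):
# reset_checkpoint(dbutils, checkpoint_path)
# or
# checkpoint_path = new_checkpoint_path("/tmp/checkpoints/my_stream")
```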

3 More Replies
halfwind22
by Databricks Partner
  • 15006 Views
  • 9 replies
  • 10 kudos

Resolved! Unable to write csv files to Azure BLOB using pandas to_csv ()

I am using a Python function to read some data from a GET endpoint and write it as a CSV file to an Azure Blob location. My GET endpoint takes 2 query parameters, param1 and param2. So initially, I have a dataframe paramDf that has two columns param1 and ...

Latest Reply
halfwind22
Databricks Partner
  • 10 kudos

@Hubert Dudek I can't issue a Spark command on an executor node; it throws an error because foreach distributes the processing.

8 More Replies
ItsMe
by New Contributor II
  • 6280 Views
  • 3 replies
  • 7 kudos

Resolved! Run Pyspark job of Python egg package using spark submit on databricks

Error: missing application resource. I'm getting this error while running a job with spark-submit. I gave the following parameters when creating the job: --conf spark.yarn.appMasterEnv.PYSAPRK_PYTHON=databricks/path/python3 --py-files dbfs/path/to/.egg job_m...

Latest Reply
User16752246494
Databricks Employee
  • 7 kudos

Hi, we tried to simulate the question on our end: we packaged a module inside a whl file. To access the wheel file, we created another Python file, test_whl_locally.py. Inside test_whl_locally.py, to access the content of the wheel file...

2 More Replies
BorislavBlagoev
by Databricks Partner
  • 6021 Views
  • 1 replies
  • 5 kudos

Resolved! Get package from Nexus repo.

I want to get a package from a Nexus repo, both in a notebook and in a job. If anyone has experience with this, please answer here!

Latest Reply
User16855813973
Databricks Employee
  • 5 kudos

For a Nexus repo in a notebook, you can use notebook-scoped libraries: use %pip install with the --index-url option. Secret management is available. See example. For cluster libraries, this is not supported from the UI.
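A hedged sketch of how the value for `--index-url` might be assembled. The helper `build_index_url`, the host `nexus.example.com`, the repository name `pypi-internal`, and the `/repository/<name>/simple` path layout are all assumptions for illustration; check the actual URL of your Nexus PyPI repository. Credentials are percent-encoded so that characters such as `@` or `/` in a token do not break the URL.

```python
from urllib.parse import quote


def build_index_url(host: str, repository: str, user: str, token: str) -> str:
    """Build a pip index URL with embedded, percent-encoded credentials."""
    user_q = quote(user, safe="")
    token_q = quote(token, safe="")
    # Assumed layout: https://<host>/repository/<repository>/simple
    return f"https://{user_q}:{token_q}@{host}/repository/{repository}/simple"


if __name__ == "__main__":
    # Placeholder values only; real credentials should not be hard-coded.
    print(build_index_url("nexus.example.com", "pypi-internal", "svc", "s3cret"))
```

In a notebook, the user and token would normally come from secret management (for example `dbutils.secrets.get(scope=..., key=...)`) rather than from literals, and the resulting URL is what gets passed to `%pip install --index-url`.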
