Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

philip
by New Contributor
  • 7064 Views
  • 2 replies
  • 2 kudos

Resolved! current date as default in a widget while scheduling the notebook

I have scheduled a notebook. Can I keep the current date as the default widget value whenever the notebook runs, while also having the flexibility to change the widget value to any other date for an ad hoc run?

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Building on Hubert's answer: from datetime import date, then date_for_widget = date.today(). If you use date_for_widget as your default value, you are there. And of course you can fill the date_for_widget variable with anything you want. You can even fetch...
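A minimal sketch of the suggestion above. Note that `dbutils` only exists inside a Databricks notebook, so only the default value itself is computed here; the widget calls are shown as comments and the widget name `run_date` is an assumption.

```python
from datetime import date

# Compute today's date on each scheduled run to use as the widget default.
date_for_widget = date.today().isoformat()  # "YYYY-MM-DD"

# In a notebook you would then register the widget (hypothetical name):
# dbutils.widgets.text("run_date", date_for_widget, "Run date")
# run_date = dbutils.widgets.get("run_date")
```

Because the default is recomputed at every run, a scheduled job always starts from today's date, while an ad hoc run can still override the widget value by hand.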

1 More Replies
MudassarA
by New Contributor II
  • 15399 Views
  • 4 replies
  • 1 kudos

Resolved! How to fix TypeError: __init__() got an unexpected keyword argument 'max_iter'?

# Create the model using sklearn (don't worry about the parameters for now): model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, max_iter=3000). Train/fit the model on the training part of the dataset: model.fit(X_train, y_train) ERROR: Typ...

Latest Reply
Fantomas_nl
New Contributor II
  • 1 kudos

Replacing max_iter with n_iter resolves the error. Thanks! It is a bit unusual to run into errors like this in this type of solution from Microsoft. As if it could not have been prevented...
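The underlying issue is a version mismatch: older scikit-learn releases named the iteration-count parameter `n_iter`, newer ones `max_iter`. A generic, version-agnostic pattern is to try the new keyword and fall back on `TypeError`. This is a hedged sketch; `FakeSGDRegressor` is a stand-in for the real estimator so the pattern itself is what's demonstrated.

```python
class FakeSGDRegressor:
    """Stand-in that, like old scikit-learn, only accepts `n_iter`."""
    def __init__(self, n_iter=5):
        self.n_iter = n_iter

def make_model(cls, iterations):
    # Try the modern keyword first, fall back to the legacy one.
    try:
        return cls(max_iter=iterations)
    except TypeError:
        return cls(n_iter=iterations)

model = make_model(FakeSGDRegressor, 3000)
```

Pinning the scikit-learn version in the cluster configuration is usually the cleaner long-term fix than keyword juggling.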

3 More Replies
Artem_Y
by Databricks Employee
  • 2538 Views
  • 1 reply
  • 2 kudos

Show all distinct values per column in dataframe

Problem Statement: I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write. Use this code to show th...
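The repetitive part of that query can be generated instead of hand-written. A minimal sketch (assumption: plain string building outside Spark, with hypothetical table and column names) of building one `collect_set()` expression per column:

```python
def distinct_values_query(table, columns):
    # One collect_set() aggregate per column, generated dynamically.
    exprs = ", ".join(
        f"collect_set({c}) AS {c}_distinct" for c in columns
    )
    return f"SELECT {exprs} FROM {table}"

query = distinct_values_query("my_table", ["city", "state"])
# In a notebook you would then run: display(spark.sql(query))
```

In practice the column list would come from the dataframe itself (e.g. `df.columns`), so the query stays correct as the schema evolves.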

collect set table
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Artem Yevtushenko​ - This is great! Thank you for sharing!

aimas
by New Contributor III
  • 8150 Views
  • 8 replies
  • 5 kudos

Resolved! error creating tables using UI

Hi, I try to create a table using the UI, but I keep getting the error "error creating table <table name> create a cluster first" even though I have a cluster already running. What is the problem?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Be sure that a cluster is selected (the arrow next to the database) and that at least the default database exists.

7 More Replies
Orianh
by Valued Contributor II
  • 25731 Views
  • 11 replies
  • 10 kudos

Resolved! Read JSON files from the s3 bucket

Hello guys, I'm trying to read JSON files from an S3 bucket, but no matter what I try I get "Query returned no result", or, if I don't specify the schema, "unable to infer a schema". I tried to mount the S3 bucket; it still doesn't work. Here is some code th...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

Please refer to the doc that helps you to read JSON. If you are getting this error, the problem is most likely with the JSON schema. Please validate it. As a test, create a simple JSON file (you can get one on the internet), upload it to your S3 bucket, and ...
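One way to do the suggested validation before involving Spark at all: Spark's JSON reader expects JSON Lines by default (one object per line), so a quick local check that every line parses can be sketched like this (the function name is hypothetical):

```python
import json

def validate_json_lines(text):
    # Check each non-empty line parses as standalone JSON, as Spark expects.
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            return False, f"line {lineno}: {exc}"
    return True, "ok"

ok, msg = validate_json_lines('{"a": 1}\n{"a": 2}')
```

A file that passes this check but still yields "unable to infer a schema" in Spark usually points at an empty file, a wrong path, or a multi-line JSON document that needs `option("multiLine", "true")`.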

10 More Replies
Data_Bricks1
by New Contributor III
  • 4018 Views
  • 7 replies
  • 0 kudos

Incrementally load data from 10 BLOB containers, each with multiple hierarchical folders (per-day and per-hour), into a Delta Lake table in Parquet format - latest data only, insert with no updates

I am able to load data for a single container by hard coding, but not able to load from multiple containers. I used a for loop, but the data frame loads only the last container's last folder record. One more issue here is that I have to flatten the data, when I ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

For sure the function (def) should be declared outside the loop - move it right after the library imports. The logic is a bit complicated, so you need to debug it by calling display(Flatten_df2) (or .show()) and validating the JSON after each iteration (using break or sleep, etc.).
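The "only the last container's record" symptom usually means the loop reassigns one variable each iteration instead of accumulating. A minimal sketch of the fix (all names are hypothetical, and plain lists stand in for Spark dataframes):

```python
def flatten(record):
    """Stand-in for the real flattening logic, defined ONCE, outside the loop."""
    return dict(record)

# Stand-in for data read from each of the 10 containers.
containers = [[{"a": 1}], [{"a": 2}], [{"a": 3}]]

flattened = []                      # accumulator survives all iterations
for container in containers:
    for record in container:
        flattened.append(flatten(record))
```

With real Spark dataframes the same idea is a running `union()` of per-container dataframes rather than reassigning a single `Flatten_df2` variable each pass.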

6 More Replies
StephanieAlba
by Databricks Employee
  • 1983 Views
  • 1 reply
  • 6 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Hi, as these are transactional tables (there are history commits and snapshots), I would not store images or videos there: the same data can be saved several times and you will have high storage costs, and it can also be slow when the data is big. I would definitely store images,...

yitao
by New Contributor III
  • 3233 Views
  • 4 replies
  • 10 kudos

Resolved! How to make sparklyr extension work with Databricks runtime?

Hello. I'm the current maintainer of sparklyr (an R interface for Apache Spark) and a few sparklyr extensions such as sparklyr.flint. Sparklyr was fortunate to receive some contributions from Databricks folks, which enabled R users to run `spark_connect...

Latest Reply
Dan_Z
Databricks Employee
  • 10 kudos

Yes, as Sebastian said. Also, it would be good to know what the error is here. One possible explanation is that the JARs are not copied to the executor nodes, which would be solved by Sebastian's suggestion.

3 More Replies
User16826994223
by Honored Contributor III
  • 6185 Views
  • 2 replies
  • 1 kudos

AssertionError: assertion failed: Unable to delete the record but I am able to select it though

Is there any reason this command works well: %sql SELECT * FROM database.table WHERE salary > 1000, returning 2 rows, while the below: %sql DELETE FROM database.table WHERE salary > 1000 fails with: Error in SQL statement: AssertionError: assertion failed:...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

DELETE FROM (and similarly UPDATE) isn't supported on Parquet files - right now on Databricks it's supported for the Delta format. You can convert your Parquet files into Delta using CONVERT TO DELTA, and then this command will work for you.
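A minimal sketch of the conversion being described, using a hypothetical path (substitute your own table location):

```sql
-- Convert an existing Parquet table in place to Delta:
CONVERT TO DELTA parquet.`/mnt/data/my_table`;

-- After conversion, row-level operations like DELETE work:
DELETE FROM delta.`/mnt/data/my_table` WHERE salary > 1000;
```

Note that `CONVERT TO DELTA` rewrites the table's metadata in place, so other readers of the raw Parquet path should be migrated to the Delta table afterwards.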

1 More Replies
dataslicer
by Contributor
  • 9901 Views
  • 4 replies
  • 4 kudos

Resolved! Unable to save Spark Dataframe to driver node's local file system as CSV file

Running Azure Databricks Enterprise DBR 8.3 ML on a single node, with a Python notebook. I have 2 small Spark dataframes that I am able to source via credential passthrough, reading from ADLS gen2 via the `abfss://` method, and display the full content ...

Latest Reply
Dan_Z
Databricks Employee
  • 4 kudos

Modern Spark operates by a design choice to separate storage and compute. So saving a CSV to the driver's local disk doesn't make sense for a few reasons: the worker nodes don't have access to the driver's disk. They would need to send the data over to...
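For a genuinely small dataframe, the usual workaround is to bring the rows to the driver first and then write with plain Python. A hedged sketch - with Spark the rows would come from `df.collect()` or `df.toPandas()`; here a literal list stands in so the write itself is what's shown:

```python
import csv
import os
import tempfile

# Stand-in for rows collected to the driver from a small Spark dataframe.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Write to a driver-local path with the stdlib csv module.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(rows)
```

This only makes sense when the data fits in driver memory; for anything larger, writing with `df.write.csv(...)` to cloud storage is the intended path.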

3 More Replies
PaulHernandez
by New Contributor II
  • 24395 Views
  • 7 replies
  • 0 kudos

Resolved! How to show an image in a notebook using html?

Hi everyone, I'm just learning how to personalize Databricks notebooks and would like to show a logo in a cell. I installed the databricks cli and was able to upload the image file to the dbfs. I try to display it like this: displayHTML("<im...

Latest Reply
_robschaper
New Contributor II
  • 0 kudos

@Paul Hernandez​ @Sean Owen​ @Navneet Tuteja​ I solved this after I also ran into the same issue, where my notebook suddenly wouldn't show an image sitting on the driver in an accessible folder - no matter what I tried in the notebook, the display...
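One robust approach (a sketch of a common workaround, not necessarily what the poster did): embed the image bytes directly in the HTML as a base64 data URI, which sidesteps notebook file-path resolution entirely. `png_bytes` stands in for `open(path, "rb").read()`:

```python
import base64

# Stand-in for real image bytes read from DBFS or the driver's disk.
png_bytes = b"\x89PNG\r\n\x1a\n"

# Encode the bytes and inline them as a data URI.
encoded = base64.b64encode(png_bytes).decode("ascii")
html = f'<img src="data:image/png;base64,{encoded}"/>'
# In a notebook: displayHTML(html)
```

Because the image travels inside the HTML itself, this works regardless of whether the browser can reach the original file location.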

6 More Replies
daindana
by New Contributor III
  • 11227 Views
  • 3 replies
  • 3 kudos

Resolved! Why doesn't my notebook display widgets when I use 'dbutils' while it is displayed with '%sql CREATE WIDGET'?

The widget is not shown when I use dbutils, while it works perfectly with SQL. For example, %sql CREATE WIDGET TEXT state DEFAULT "CA" - this one shows me the widget. dbutils.widgets.text("name", "Brickster", "Name") dbutils.widgets.multiselect("colors", "oran...

dbutils get info from widget dbutils widget creation
Latest Reply
daindana
New Contributor III
  • 3 kudos

Hello, Ryan! For some reason, this problem is solved, and now it is working perfectly! I did nothing new, but it is just working now. Thank you!:)

2 More Replies
BorislavBlagoev
by Valued Contributor III
  • 5340 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks writeStream checkpoint

I'm trying to execute this writeStream data_frame.writeStream.format("delta") \ .option("checkpointLocation", checkpoint_path) \ .trigger(processingTime="1 second") \ .option("mergeSchema", "true") \ .o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You can remove that folder so it will be recreated automatically. Additionally, every new job run should have a new (or just empty) checkpoint location. You can add this to your code before starting the stream: dbutils.fs.rm(checkpoint_path, True). Additionally you...
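A minimal local sketch of that suggestion: wipe the checkpoint directory before starting a fresh streaming run. On Databricks this is `dbutils.fs.rm(checkpoint_path, True)`; `shutil.rmtree` is the plain-Python equivalent for a driver-local path, shown here on a temporary directory:

```python
import os
import shutil
import tempfile

# Stand-in checkpoint directory (on Databricks this would be a DBFS path).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint")
os.makedirs(checkpoint_path)

# Remove it so the next writeStream starts from a clean checkpoint.
shutil.rmtree(checkpoint_path, ignore_errors=True)
exists_after = os.path.exists(checkpoint_path)
```

Be aware that deleting a checkpoint discards the stream's progress, so the next run reprocesses the source from the beginning; only do this when that is actually what you want.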

3 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
