Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cmilligan
by Contributor II
  • 2326 Views
  • 3 replies
  • 2 kudos

Dropdown for parameters in a job

I want to be able to denote the type of run from a predetermined list of values that a user can choose from when kicking off a run using different parameters. Our team does standardized job runs on a weekly cadence but can have timeframes that change...

Latest Reply
dev56
New Contributor II
  • 2 kudos

Hi @cmilligan, I have a similar requirement and would be really grateful if you could provide any information on how to fix this issue. Thanks a lot!

2 More Replies
mbejarano89
by New Contributor III
  • 1610 Views
  • 2 replies
  • 0 kudos

Running a K-means (.fit) gives error: "Params must be either a param map or a list/tuple of param maps but got %s." % type(params)

I am running a k-means algorithm. My features are DoubleType and have no nulls, but I get: raise TypeError("Params must be either a param map or a list/tuple of param maps but got %s." % type(params)). Does anyone have any idea how to solve this? File /datab...

Latest Reply
mbejarano89
New Contributor III
  • 0 kudos

I found the answer just by trying several things, although I do not understand exactly what the problem was. All I had to do was cache the input data before fitting the model: assemble = VectorAssembler(inputCols=columns_input, outputCol='features')...

1 More Replies
kll
by New Contributor III
  • 7759 Views
  • 2 replies
  • 3 kudos

Nested struct type not supported pyspark error

I am attempting to apply a function to a pyspark DataFrame, save the API response to a new column, and then parse it using `json_normalize`. This works fine in pandas; however, I run into an exception with `pyspark`. import pyspark.pandas as ps i...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Keval Shah, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

1 More Replies
PraveenC
by New Contributor II
  • 2268 Views
  • 4 replies
  • 3 kudos

[Databricks][JDBC](10400) Invalid type for data - column: 10, type: Array

I am getting the error below while mapping an array column to a String[] entity. Please advise whether Databricks JDBC supports entity mapping of array values. [The same code worked with the config below - H2 DB version - 2.1.214 and org.hibernate.dialect.H2Dialect - ...

Latest Reply
Atanu
Esteemed Contributor
  • 3 kudos

Hello @Emmanuel Trindade @Praveen C, this does not look like it is coming from the Databricks end. Look at the error thread: javax.persistence.PersistenceException: org.hibernate.exception.DataException: Could not read entity state from ResultSet : EntityKey...

3 More Replies
Philearner
by New Contributor II
  • 1808 Views
  • 3 replies
  • 3 kudos

Unable to find input by typing input in the Multiselect Widget

In the AWS Databricks widgets.multiselect, I'm unable to find an input by typing it in the multiselect bar. It was working before. Although I can find the inputs by scrolling down the list, that is annoying if the list is long. Here's my script: measlis...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Philip Teu, hope all is well! Just wanted to check in to see if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
MShee
by New Contributor II
  • 1157 Views
  • 1 replies
  • 1 kudos
Latest Reply
NandiniN
Honored Contributor
  • 1 kudos

Hello @M Shee, in a dropdown you can select a value from a list of provided values, but you cannot type values in. What you might be interested in is a combobox. It is a combination of a text box and a dropdown. It lets you select a value from a provided list or ...
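A minimal sketch of the combobox idea from this reply, assuming it runs in a Databricks notebook where `dbutils` is available. The helper name `add_mode_widget`, the widget name `mode`, and the choices are hypothetical and only illustrate the call; `dbutils` is passed in as an argument so the function can be defined outside a notebook as well.

```python
def add_mode_widget(dbutils, name="mode", choices=("daily", "weekly", "monthly")):
    """Create a combobox widget and return its current value.

    The widget name and the choices are hypothetical examples.
    """
    choices = list(choices)
    # Arguments: name, default value, list of choices, label shown in the UI.
    dbutils.widgets.combobox(name, choices[0], choices, "Mode")
    # Read back whatever is currently selected or typed into the widget.
    return dbutils.widgets.get(name)
```

In a notebook this would be called as `add_mode_widget(dbutils)`. Swapping `combobox` for `dropdown` with the same arguments should restrict the user to the listed values only.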

jonathan-dufaul
by Valued Contributor
  • 1459 Views
  • 2 replies
  • 2 kudos

Resolved! Does anyone have a single example of a graphframe with two+ types of vertices? (e.g. user and post, not user to user)

I have gone through about 75 pages and every single example has only relationships from one type of object to the same type of object. About 90% have the exact same example of "Alice Bob" "friends." Has anyone ever made a graphframe with two types of ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I feel your pain. I once tried to use graphframes to flatten a complex tree and ended up using GraphX (which is even worse to use, but at least it is more flexible). So maybe take a look at GraphX? Beware, it is terrible to use. I wonder what happened to m...

1 More Replies
Panna
by New Contributor II
  • 1365 Views
  • 2 replies
  • 3 kudos

Is there only one element type option for an array?

I'm creating an array that contains both strings and doubles. Can one array column have multiple element types? Thanks

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Panna Pan, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to...

1 More Replies
danny_edm
by New Contributor
  • 518 Views
  • 0 replies
  • 0 kudos

collect_set gives weird result when Photon is enabled

Cluster: DBR 10.4 LTS with Photon
Sample schema: seq_no (decimal), type (string)
Sample data (seq_no, type): (1, A), (1, A), (2, A), (2, B), (2, B)
Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no"))...
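The post is cut off before the observed output, so the unexpected result itself cannot be reproduced here. As a reference point, the plain-Python sketch below (not Spark code) works out what the size of a collect_set over a window partitioned by seq_no would normally be expected to return on the sample rows. The function name is hypothetical, and plain integers stand in for the decimal column.

```python
from collections import defaultdict

# Sample rows from the post as (seq_no, type); ints stand in for the decimal column.
rows = [(1, "A"), (1, "A"), (2, "A"), (2, "B"), (2, "B")]


def distinct_count_per_partition(rows):
    """Mimic size(collect_set(type)) over a window partitioned by seq_no."""
    distinct = defaultdict(set)
    for seq_no, type_ in rows:
        distinct[seq_no].add(type_)
    # A window function yields one value per input row, not one per group.
    return [(seq_no, type_, len(distinct[seq_no])) for seq_no, type_ in rows]


expected = distinct_count_per_partition(rows)
for row in expected:
    print(row)
```

Every row with seq_no 1 gets the count 1 and every row with seq_no 2 gets the count 2, which gives a baseline to compare the cluster output against.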

cmotla
by New Contributor III
  • 1729 Views
  • 3 replies
  • 8 kudos

Issue with complex json based data frame select

We are getting the error below when trying to select the nested columns (string type in a struct), even though we don't have more than 1,000 records in the data frame. The schema is very complex and has a few columns of struct type and a few of array typ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 8 kudos

Hi @Chaitanya Motla​ , Just a friendly follow-up. Do you still need help, or did you find the solution? Please let us know.

2 More Replies
NAS
by New Contributor III
  • 1291 Views
  • 1 replies
  • 1 kudos

Resolved! "import pandas as pd" => [Errno 5]

When I type import pandas as pd from a notebook in a Repo, I get: AttributeError Traceback (most recent call last) /usr/lib/python3.8/importlib/_boots...

Latest Reply
NAS
New Contributor III
  • 1 kudos

Thanks to Elliott Hertz, I found out that the ML Experiments cannot be stored in the repo. After I moved them to my Workspace everything seems to work.

sravan_enukonda
by New Contributor II
  • 2080 Views
  • 3 replies
  • 2 kudos

Resolved! I am looking for best practices for implementing Ranger-type access control in Databricks

We need this for auditing and for tracking the number of users accessing databases and tables created in Databricks.

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @sravankumar enukonda​ , How are you?

2 More Replies
frank26364
by New Contributor III
  • 11269 Views
  • 4 replies
  • 0 kudos

Resolved! Command prompt won't let me type the Databricks token

Hi, I am trying to set up the Databricks CLI using the command prompt on my computer. I downloaded the Python 3.9 app and successfully ran the command pip install databricks-cli. When I try to set up the Databricks token, I am able to type my Databricks Ho...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there! You're on a roll today! Thanks for letting us know.

3 More Replies
AzureDatabricks
by New Contributor III
  • 7071 Views
  • 7 replies
  • 2 kudos

Resolved! Can we store 300 million records and what is the preferable compute type and config?

How can we persist 300 million records? What is the best option for persisting the data: the Databricks Hive metastore, Azure storage, or a Delta table? What limitations do Databricks Delta tables have in terms of data? We have a use case where testers should be...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

You can certainly store 300 million records without any problem. The best option kinda depends on the use case. If you want to do a lot of online querying on the table, I suggest using Delta Lake, which is optimized (using z-order, bloom filter, par...

6 More Replies
Anonymous
by Not applicable
  • 928 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

For Delta in general, the Delta cache accelerates data reads by creating copies of remote files in the nodes' local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote locatio...
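A small sketch of switching this cache on explicitly, for clusters where it is not already enabled by default. It assumes a Databricks session object `spark` and uses the `spark.databricks.io.cache.enabled` setting; the helper name `enable_delta_cache` is hypothetical, and the session is passed in so the function can be defined anywhere.

```python
CACHE_KEY = "spark.databricks.io.cache.enabled"


def enable_delta_cache(spark):
    """Enable the cache for the current session and return the stored setting.

    Assumes `spark` is a Databricks SparkSession; this is a session-level
    toggle, not a cluster-wide configuration change.
    """
    spark.conf.set(CACHE_KEY, "true")
    return spark.conf.get(CACHE_KEY)
```

In a notebook this would be called as `enable_delta_cache(spark)`. Once enabled, no further action is needed, since files are cached as they are read.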
