Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SreedharVengala
by New Contributor III
  • 13411 Views
  • 2 replies
  • 1 kudos

Parsing deeply nested XML in Databricks

Hi guys, can someone point me to libraries to parse XML files in Databricks using Python / Scala? Any link to a blog / documentation will be helpful. Looked into https://docs.databricks.com/data/data-sources/xml.html. Want to parse XSDs; seems this is exp...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Sreedhar Vengala​ - I heard back from the team. As you noted, the feature is still experimental and not supported at this time. I would like to assure you that the team is aware of this. I have no information about a time frame to make this a support...

1 More Replies
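The docs the thread points to cover the spark-xml library, which is the Databricks-documented route. As a hedged, Spark-free illustration of the core idea (walking a deeply nested document and flattening it into rows), here is a sketch using only Python's standard library; the sample document and dotted-path naming are my own, not from the thread:

```python
# Minimal sketch: flatten deeply nested XML into row dicts with dotted keys.
# On Databricks, the spark-xml library does this at scale; the idea is the same.
import xml.etree.ElementTree as ET

doc = """
<orders>
  <order id="1">
    <customer><name>Alice</name><address><city>Oslo</city></address></customer>
    <items><item sku="A1" qty="2"/><item sku="B2" qty="1"/></items>
  </order>
</orders>
"""

def flatten(elem, prefix=""):
    """Recursively flatten one element into a dict of dotted paths."""
    row = {f"{prefix}@{k}": v for k, v in elem.attrib.items()}
    if elem.text and elem.text.strip():
        row[prefix.rstrip(".")] = elem.text.strip()
    for child in elem:
        row.update(flatten(child, f"{prefix}{child.tag}."))
    return row

root = ET.fromstring(doc)
rows = [flatten(order, "order.") for order in root.findall("order")]
print(rows[0]["order.customer.name"])          # Alice
print(rows[0]["order.customer.address.city"])  # Oslo
```

Note the sketch overwrites repeated siblings (the two `item` elements collapse to one key); exploding repeated elements into multiple rows is exactly the part spark-xml handles for you.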
Jack
by New Contributor II
  • 3699 Views
  • 1 replies
  • 0 kudos

Resolved! Creating Pandas Data Frame of Features After Applying Variance Reduction

I am building a classification model using the following data frame of 120,000 records (sample of 5 records shown). Using this data, I have built the following model:

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction....

Latest Reply
Dan_Z
Databricks Employee
  • 0 kudos

This is more of a scikit-learn question than a Databricks question. But poking around, I think VT_reduced.get_support() is probably what you are looking for: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold....

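To make the suggestion concrete: VT_reduced.get_support() returns a boolean mask over the input columns, which you can zip against the original column names to rebuild a labeled pandas frame. A hedged sketch of that mapping, reimplemented with the standard library so the mask-to-names step is explicit (the column names and data here are made up):

```python
# What VarianceThreshold.get_support() gives you is a boolean mask over
# columns. Shown here with stdlib variance so the mapping is explicit;
# with sklearn you would zip df.columns against VT_reduced.get_support()
# the same way.
from statistics import pvariance

columns = ["age", "constant_flag", "income"]
data = [
    [25, 1, 50_000],
    [32, 1, 64_000],
    [47, 1, 58_000],
]
threshold = 0.0  # sklearn's default: drop zero-variance features

# Per-column population variance, as VarianceThreshold computes it.
variances = [pvariance(col) for col in zip(*data)]
support = [v > threshold for v in variances]   # the get_support() mask
kept = [name for name, keep in zip(columns, support) if keep]
print(kept)  # ['age', 'income']
```

With sklearn, the equivalent one-liner would be along the lines of `pd.DataFrame(X_reduced, columns=df.columns[VT_reduced.get_support()])`.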
Zircoz
by Databricks Partner
  • 15373 Views
  • 2 replies
  • 6 kudos

Resolved! Can we access the variables created in Python in Scala's code or notebook ?

If I have a dict created in Python in a Scala notebook (using the magic command, of course):

%python
d1 = {1: "a", 2: "b", 3: "c"}

can I access this d1 in Scala? I tried the following and it returns d1 not found:

%scala
println(d1)

Latest Reply
cpm1
New Contributor II
  • 6 kudos

Martin is correct. We could only access the external files and objects. In most of our cases, we just use temporary views to pass data between R & Python. https://docs.databricks.com/notebooks/notebooks-use.html#mix-languages

1 More Replies
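Each language cell runs in its own REPL, so the variable itself is never visible across languages; what the REPLs share is the SparkSession. A hedged notebook sketch of the temp-view workaround from the linked docs (the view name and column names are illustrative):

```
%python
d1 = {1: "a", 2: "b", 3: "c"}
spark.createDataFrame(list(d1.items()), ["key", "value"]) \
     .createOrReplaceTempView("d1_view")

%scala
// d1 itself is not visible here, but the temp view registered against
// the shared SparkSession is.
val d1 = spark.table("d1_view").collect()
             .map(r => r.getLong(0) -> r.getString(1)).toMap
println(d1(1))  // a
```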
irfanaziz
by Contributor II
  • 2995 Views
  • 1 replies
  • 1 kudos

Resolved! How to keep the original Swedish/Finnish character in the file?

The files are in ANSI format, as shown in Notepad. I could manually convert the files to UTF-8 and read them, but the files are really large. I don't want to download and upload the files. Is there a way I could keep the Swedish/Finnish characte...

Latest Reply
irfanaziz
Contributor II
  • 1 kudos

So the answer was to use option("charset", "iso-8859-1") when reading the file.

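A hedged illustration of why that option works: Windows "ANSI" files typically hold Latin-1 (or the near-identical cp1252) bytes, so telling the reader the real charset decodes the Swedish/Finnish letters without rewriting the file as UTF-8 first. The byte values and path below are illustrative:

```python
# "ANSI" bytes for ä ö å, as stored in a Latin-1 file.
raw = bytes([0xE4, 0xF6, 0xE5])

text = raw.decode("iso-8859-1")
print(text)  # äöå

# The same bytes are NOT valid UTF-8, which is why a default read garbles them.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("utf-8 decode fails on these bytes")

# On Databricks the equivalent is, e.g. (illustrative path):
# spark.read.option("charset", "iso-8859-1").csv("/mnt/data/file.csv")
```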
SindhuG
by New Contributor
  • 2202 Views
  • 0 replies
  • 0 kudos

Hi All, I need to extract rows from a dataframe based on a list of values (e.g. dates) located in a CSV file. Can anyone please help me? I have tried the groupby function but am not able to get the expected result. Thanks in advance.

My dataframe looks like this:

df =
Date         column2   column3   Machine
1-jan-2020   ...       ...       A
2-jan-2020   ...       ...       A
18-jan-2020  ...       ...       A
11-jan-2020  ...       ...       B
12-jan-2020  ...       ...       B
6-feb-2020   ...       ...       C
7-feb-2020   ...       ...       C
14-feb-2020  ...       ...       C

(the column2/column3 values did not survive extraction)

The date-details CSV file looks like this:

D =
Machine   Selected Date
A         15-jan-2020
C         12-f...

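No reply was posted to this thread. One common approach, sketched here with the standard library under the assumption that the goal is to keep rows whose (Machine, Date) pair appears in the lookup CSV (the column names are guesses from the post; with pandas the same idea is an inner merge on those columns):

```python
# Hedged sketch: keep only the rows whose (Machine, Date) pair appears in
# the "selected dates" CSV.
import csv, io

# Stand-in for the main dataframe (column names guessed from the post).
rows = [
    {"Date": "11-jan-2020", "Machine": "B"},
    {"Date": "15-jan-2020", "Machine": "A"},
    {"Date": "12-feb-2020", "Machine": "C"},
    {"Date": "14-feb-2020", "Machine": "C"},
]

# Stand-in for the date-details CSV file D.
lookup_csv = "Machine,Selected Date\nA,15-jan-2020\nC,12-feb-2020\n"
wanted = {
    (r["Machine"], r["Selected Date"])
    for r in csv.DictReader(io.StringIO(lookup_csv))
}

selected = [r for r in rows if (r["Machine"], r["Date"]) in wanted]
print(selected)
```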
vishavgupta988
by New Contributor
  • 6249 Views
  • 2 replies
  • 0 kudos

How to set font-size of values in each cell of dataframe?

I am working with pandas and Python. After processing a particular dataframe in my program, I am appending that dataframe below an existing Excel file. The problem is that my Excel file has a font size of 11 pt but the dataframe has a font size of 12 pt. I want to set f...

Latest Reply
DominicFHelms
New Contributor II
  • 0 kudos

I like sharp fonts.

1 More Replies
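The thread never got a substantive answer. For what it's worth, pandas itself does not store font information; the 11 pt vs 12 pt difference lives in the Excel file, so one common approach is to reopen the workbook after appending and normalize the font on the new cells. A hedged sketch assuming openpyxl is available (filename and row count are placeholders):

```
# Requires openpyxl. Reopen the workbook after pandas appends the data and
# set the font on the newly written block of cells.
from openpyxl import load_workbook
from openpyxl.styles import Font

len_new_rows = 5                    # placeholder: rows just appended
wb = load_workbook("report.xlsx")   # illustrative filename
ws = wb.active
for row in ws.iter_rows(min_row=ws.max_row - len_new_rows + 1):
    for cell in row:
        cell.font = Font(size=11)   # match the existing 11 pt body
wb.save("report.xlsx")
```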
MikeBrewer
by New Contributor II
  • 22728 Views
  • 3 replies
  • 0 kudos

Am trying to use SQL, but createOrReplaceTempView("myDataView")​ fails

Am trying to use SQL, but createOrReplaceTempView("myDataView") fails. I can create and display a DataFrame fine:

import pandas as pd
df = pd.DataFrame(['$3,000,000.00', '$3,000.00', '$200.5', '$5.5'], columns=['Amount'])
df

I add another cell, ...

Latest Reply
sachinthana
New Contributor II
  • 0 kudos

This worked for me. Thank you @acorson​.

2 More Replies
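The accepted answer is truncated in this listing, but the likely cause is that createOrReplaceTempView exists on Spark DataFrames, not pandas ones, so the pandas frame has to be converted first. A hedged notebook sketch of that fix (this is the usual resolution for this error, not necessarily the exact answer given in the thread):

```
# createOrReplaceTempView lives on the Spark DataFrame, so convert the
# pandas frame via the notebook's SparkSession first.
import pandas as pd

pdf = pd.DataFrame(['$3,000,000.00', '$3,000.00', '$200.5', '$5.5'],
                   columns=['Amount'])

sdf = spark.createDataFrame(pdf)   # spark = the notebook's SparkSession
sdf.createOrReplaceTempView("myDataView")

# Now SQL works:
# %sql SELECT * FROM myDataView
```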
User16826987838
by Databricks Employee
  • 2518 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Using cluster tags, we can get the cluster name:

spark.conf.get("spark.databricks.clusterUsageTags.clusterName")

User16790091296
by Databricks Employee
  • 2445 Views
  • 1 replies
  • 0 kudos

How to read a Databricks table via Databricks api in Python?

Using Python 3, I am trying to compare an Excel (xlsx) sheet to an identical Spark table in Databricks. I want to avoid doing the compare in Databricks. So I am looking for a way to read the Spark table via the Databricks API. Is this possible? How c...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

What is the format of the table? If it is Delta, you could use the Python bindings for the native Rust API, read the table from your Python code, and compare, bypassing the metastore.

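The Python bindings the reply refers to are, as far as I can tell, the `deltalake` package (the Python layer over the Rust delta-rs engine). A hedged sketch of reading a Delta table by path outside Databricks and comparing it to the sheet (the paths and filenames are illustrative):

```
# Requires the `deltalake` package (Python bindings over Rust delta-rs):
# read a Delta table by storage path, no metastore involved.
from deltalake import DeltaTable
import pandas as pd

dt = DeltaTable("/mnt/lake/my_table")   # illustrative path
df = dt.to_pandas()

xlsx = pd.read_excel("reference.xlsx")  # illustrative filename
print(df.equals(xlsx))
```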
alexott
by Databricks Employee
  • 6840 Views
  • 2 replies
  • 0 kudos

How I can test my Python code that I wrote using notebooks?

I've written code in notebooks using Python, and I want to add tests to it to make sure that it won't break when I make more changes. What tools can I use for that task?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

@Alex Ott​ has an awesome answer! Here is a great blog from our engineering team that may help as well: https://databricks.com/blog/2020/01/16/automate-deployment-and-testing-with-databricks-notebook-mlflow.html

1 More Replies
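Beyond the CI pipeline the blog describes, one minimal pattern is to keep the notebook's logic in plain functions and assert on them with the standard library's unittest. This sketch is illustrative (the function name and behavior are made up, not from the thread):

```python
# Hedged pattern: keep notebook logic in plain functions (importable or
# %run-able from the notebook) and test them with stdlib unittest.
import unittest

def clean_amount(raw: str) -> float:
    """Parse a currency string like '$3,000.00' into a float."""
    return float(raw.replace("$", "").replace(",", ""))

class CleanAmountTest(unittest.TestCase):
    def test_parses_thousands(self):
        self.assertEqual(clean_amount("$3,000.00"), 3000.0)

    def test_parses_small_values(self):
        self.assertEqual(clean_amount("$5.5"), 5.5)

result = unittest.main(argv=["ignored"], exit=False).result
print(result.wasSuccessful())
```

The same test classes can then run under pytest or a notebook-based runner in CI, which is where the linked blog picks up.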
Anonymous
by Not applicable
  • 2165 Views
  • 2 replies
  • 1 kudos

What Databricks Runtime will I have to use if I want to leverage Python 2?

I have some code which is dependent on python 2. I am not able to use Python 2 with Databricks runtime 6.0.

Latest Reply
User16826994223
Databricks Employee
  • 1 kudos

When you create a Databricks Runtime 5.5 LTS cluster by using the workspace UI, the default is Python 3. You have the option to specify Python 2. If you use the Databricks REST API to create a cluster using Databricks Runtime 5.5 LTS, the default is ...

1 More Replies
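If memory serves from the docs of that era, the REST default on 5.5 LTS was Python 2, and the Python version was selected through the cluster's environment variables. A hedged sketch of a Clusters API create payload (field values are illustrative; verify against the Databricks REST API reference for your workspace):

```
{
  "cluster_name": "python3-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  }
}
```

Omitting spark_env_vars in a REST-created 5.5 LTS cluster would leave you on Python 2, which matches what the original poster needs for their legacy code.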
Srikanth_Gupta_
by Databricks Employee
  • 4918 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Temp Views and Global Temp Views are the most common way of sharing data across languages within a Notebook/Cluster

1 More Replies
Anonymous
by Not applicable
  • 2694 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

You can use libraries such as Seaborn, Bokeh, Matplotlib, or Plotly for visualization inside Python notebooks. See https://docs.databricks.com/notebooks/visualizations/index.html#visualizations-in-python. Also, Databricks has its own built-in visualiza...
